Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 2227577 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 178.4 MiB |
| Average record size in memory | 84.0 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 2 |
| Boolean | 4 |
Alerts
| Alert | Type |
|---|---|
| Rating is highly correlated with Rating Count | High correlation |
| Rating Count is highly correlated with Maximum Installs | High correlation |
| Installs is highly correlated with Maximum Installs | High correlation |
| Maximum Installs is highly correlated with Rating Count and 1 other field | High correlation |
| Free is highly correlated with Price | High correlation |
| Price is highly correlated with Free | High correlation |
| Category is highly correlated with Content Rating and 1 other field | High correlation |
| Content Rating is highly correlated with Category | High correlation |
| Ad Supported is highly correlated with Category | High correlation |
| Rating Count is highly skewed (γ1 = 261.321718) | Skewed |
| Installs is highly skewed (γ1 = 185.3775748) | Skewed |
| Maximum Installs is highly skewed (γ1 = 169.9392771) | Skewed |
| Price is highly skewed (γ1 = 99.08154382) | Skewed |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| Rating has 1041090 (46.7%) zeros | Zeros |
| Rating Count has 1041090 (46.7%) zeros | Zeros |
| Price has 2185451 (98.1%) zeros | Zeros |
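The zero shares quoted in these alerts can be rechecked from the counts alone. The following is a minimal sketch that only redoes the arithmetic; the helper name `share` is hypothetical and is not part of the report generator.

```python
# Counts copied from the alerts; N is the number of observations in the dataset.
N = 2_227_577
zero_counts = {"Rating": 1_041_090, "Rating Count": 1_041_090, "Price": 2_185_451}


def share(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


shares = {name: share(count, N) for name, count in zero_counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct} zeros")
```

The identical zero counts for Rating and Rating Count are consistent with unrated apps being stored as 0 in both columns, so the zeros may be better read as "no rating yet" than as a genuine score.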
Reproduction
| Analysis started | 2022-10-29 19:05:35.861483 |
|---|---|
| Analysis finished | 2022-10-29 19:08:03.326488 |
| Duration | 2 minutes and 27.47 seconds |
| Software version | pandas-profiling v3.4.0 |
df_index
Real number (ℝ≥0)
| Distinct | 2227577 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1156663.919 |
| Minimum | 0 |
| Maximum | 2312943 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 116074.8 |
| Q1 | 578364 |
| median | 1156444 |
| Q3 | 1734870 |
| 95-th percentile | 2197363.2 |
| Maximum | 2312943 |
| Range | 2312943 |
| Interquartile range (IQR) | 1156506 |
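The non-integer 5-th percentile (116074.8) suggests that quantiles are interpolated. A minimal sketch, assuming linear interpolation between neighbouring order statistics (the default of pandas `Series.quantile`) and using hypothetical toy data:

```python
def percentile(sorted_values, q):
    """Return the q-quantile (q in [0, 1]) using linear interpolation."""
    pos = q * (len(sorted_values) - 1)          # fractional position in the sorted data
    lo = int(pos)                               # index of the lower neighbour
    hi = min(lo + 1, len(sorted_values) - 1)    # index of the upper neighbour
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


values = sorted([0, 3, 4, 7, 10, 15, 21, 30, 42])  # hypothetical toy data
q1, median, q3 = (percentile(values, q) for q in (0.25, 0.5, 0.75))
iqr = q3 - q1                          # interquartile range
value_range = values[-1] - values[0]   # maximum minus minimum

# The same identity applied to the figures in the table above.
report_iqr = 1734870 - 578364
print(q1, median, q3, iqr, value_range, report_iqr)
```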
Descriptive statistics
| Standard deviation | 667627.6839 |
|---|---|
| Coefficient of variation (CV) | 0.5772010978 |
| Kurtosis | -1.200549236 |
| Mean | 1156663.919 |
| Median Absolute Deviation (MAD) | 578252 |
| Skewness | 9.175538382 × 10⁻⁵ |
| Sum | 2.576557943 × 10¹² |
| Variance | 4.457267242 × 10¹¹ |
| Monotonicity | Strictly increasing |
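These figures are what one would expect from an almost evenly spaced index: skewness near 0 and excess kurtosis near -1.2. The sketch below uses population (biased) moments for brevity; the report's values may come from bias-corrected estimators, which differ negligibly at this sample size. The function name `describe` is hypothetical.

```python
import math


def describe(values):
    """Mean, standard deviation, CV, skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,                 # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,     # excess kurtosis (normal distribution = 0)
    }


stats = describe(range(10_000))           # evenly spaced integers 0..9999
report_cv = 667627.6839 / 1156663.919     # standard deviation / mean from the table
print(stats["skewness"], stats["kurtosis"], report_cv)
```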
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1542287 | 1 | < 0.1% |
| 1542301 | 1 | < 0.1% |
| 1542300 | 1 | < 0.1% |
| 1542299 | 1 | < 0.1% |
| 1542298 | 1 | < 0.1% |
| 1542297 | 1 | < 0.1% |
| 1542296 | 1 | < 0.1% |
| 1542295 | 1 | < 0.1% |
| 1542294 | 1 | < 0.1% |
| Other values (2227567) | 2227567 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2312943 | 1 | < 0.1% |
| 2312942 | 1 | < 0.1% |
| 2312941 | 1 | < 0.1% |
| 2312940 | 1 | < 0.1% |
| 2312939 | 1 | < 0.1% |
| 2312938 | 1 | < 0.1% |
| 2312937 | 1 | < 0.1% |
| 2312936 | 1 | < 0.1% |
| 2312935 | 1 | < 0.1% |
| 2312934 | 1 | < 0.1% |
Category
Categorical
| Distinct | 48 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.0 MiB |
| Education | 233858 |
|---|---|
| Music & Audio | 152493 |
| Business | 138291 |
| Tools | 137839 |
| Entertainment | 134355 |
| Other values (43) | 1430741 |
Length
| Max length | 23 |
|---|---|
| Median length | 15 |
| Mean length | 10.4002672 |
| Min length | 4 |
Characters and Unicode
| Total characters | 23167396 |
|---|---|
| Distinct characters | 41 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
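A minimal sketch of how such character properties can be tallied with Python's standard `unicodedata` module; whether the report generator relies on this exact module is an assumption. The two-letter codes correspond to the labels used in this report: `Lu` Uppercase Letter, `Ll` Lowercase Letter, `Zs` Space Separator, `Po` Other Punctuation, `Nd` Decimal Number, `Sm` Math Symbol.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every string."""
    counts = Counter()
    for text in values:
        counts.update(unicodedata.category(ch) for ch in text)
    return counts


# Two values taken from this report's Category and Content Rating columns.
counts = category_counts(["Music & Audio", "Everyone 10+"])
print(counts.most_common())
```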
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Adventure |
|---|---|
| 2nd row | Tools |
| 3rd row | Productivity |
| 4th row | Communication |
| 5th row | Tools |
Common Values
| Value | Count | Frequency (%) |
| Education | 233858 | 10.5% |
| Music & Audio | 152493 | 6.8% |
| Business | 138291 | 6.2% |
| Tools | 137839 | 6.2% |
| Entertainment | 134355 | 6.0% |
| Lifestyle | 115415 | 5.2% |
| Books & Reference | 114621 | 5.1% |
| Personalization | 87506 | 3.9% |
| Health & Fitness | 80742 | 3.6% |
| Productivity | 75522 | 3.4% |
| Other values (38) | 956935 | 43.0% |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| | 618978 | 17.7% |
| education | 233858 | 6.7% |
| music | 156515 | 4.5% |
| audio | 152493 | 4.4% |
| business | 138291 | 4.0% |
| tools | 137839 | 4.0% |
| entertainment | 134355 | 3.9% |
| lifestyle | 115415 | 3.3% |
| books | 114621 | 3.3% |
| reference | 114621 | 3.3% |
| Other values (52) | 1570980 | 45.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 2023801 | 8.7% |
| e | 1901848 | 8.2% |
| o | 1879000 | 8.1% |
| n | 1731829 | 7.5% |
| t | 1497767 | 6.5% |
| s | 1488549 | 6.4% |
| a | 1442402 | 6.2% |
| (space) | 1260389 | 5.4% |
| u | 1003856 | 4.3% |
| c | 954038 | 4.1% |
| Other values (31) | 7983917 | 34.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 18419041 | 79.5% |
| Uppercase Letter | 2868988 | 12.4% |
| Space Separator | 1260389 | 5.4% |
| Other Punctuation | 618978 | 2.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 2023801 | 11.0% |
| e | 1901848 | 10.3% |
| o | 1879000 | 10.2% |
| n | 1731829 | 9.4% |
| t | 1497767 | 8.1% |
| s | 1488549 | 8.1% |
| a | 1442402 | 7.8% |
| u | 1003856 | 5.5% |
| c | 954038 | 5.2% |
| r | 811596 | 4.4% |
| Other values (13) | 3684355 | 20.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 413878 | 14.4% |
| A | 285955 | 10.0% |
| B | 274425 | 9.6% |
| P | 271457 | 9.5% |
| M | 253462 | 8.8% |
| F | 215734 | 7.5% |
| T | 214105 | 7.5% |
| S | 191427 | 6.7% |
| L | 185231 | 6.5% |
| R | 133367 | 4.6% |
| Other values (6) | 429947 | 15.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1260389 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| & | 618978 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 21288029 | 91.9% |
| Common | 1879367 | 8.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 2023801 | 9.5% |
| e | 1901848 | 8.9% |
| o | 1879000 | 8.8% |
| n | 1731829 | 8.1% |
| t | 1497767 | 7.0% |
| s | 1488549 | 7.0% |
| a | 1442402 | 6.8% |
| u | 1003856 | 4.7% |
| c | 954038 | 4.5% |
| r | 811596 | 3.8% |
| Other values (29) | 6553343 | 30.8% |
Common
| Value | Count | Frequency (%) |
| (space) | 1260389 | 67.1% |
| & | 618978 | 32.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 23167396 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 2023801 | 8.7% |
| e | 1901848 | 8.2% |
| o | 1879000 | 8.1% |
| n | 1731829 | 7.5% |
| t | 1497767 | 6.5% |
| s | 1488549 | 6.4% |
| a | 1442402 | 6.2% |
| (space) | 1260389 | 5.4% |
| u | 1003856 | 4.3% |
| c | 954038 | 4.1% |
| Other values (31) | 7983917 | 34.5% |
Rating
Real number (ℝ≥0)
| Distinct | 42 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.185477674 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 1041090 |
| Zeros (%) | 46.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2.9 |
| Q3 | 4.3 |
| 95-th percentile | 4.9 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 4.3 |
Descriptive statistics
| Standard deviation | 2.108288846 |
|---|---|
| Coefficient of variation (CV) | 0.9646810266 |
| Kurtosis | -1.860825924 |
| Mean | 2.185477674 |
| Median Absolute Deviation (MAD) | 2.1 |
| Skewness | 0.01530924985 |
| Sum | 4868319.8 |
| Variance | 4.444881857 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=42)
Common Values
| Value | Count | Frequency (%) |
| 0 | 1041090 | 46.7% |
| 5 | 97999 | 4.4% |
| 4.2 | 84568 | 3.8% |
| 4.4 | 83043 | 3.7% |
| 4.3 | 79955 | 3.6% |
| 4.6 | 75664 | 3.4% |
| 4.5 | 73993 | 3.3% |
| 4.1 | 66769 | 3.0% |
| 4 | 64579 | 2.9% |
| 4.7 | 60295 | 2.7% |
| Other values (32) | 499622 | 22.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1041090 | 46.7% |
| 1 | 704 | < 0.1% |
| 1.1 | 233 | < 0.1% |
| 1.2 | 514 | < 0.1% |
| 1.3 | 559 | < 0.1% |
| 1.4 | 974 | < 0.1% |
| 1.5 | 1128 | 0.1% |
| 1.6 | 1596 | 0.1% |
| 1.7 | 1864 | 0.1% |
| 1.8 | 2855 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 5 | 97999 | 4.4% |
| 4.9 | 43474 | 2.0% |
| 4.8 | 59538 | 2.7% |
| 4.7 | 60295 | 2.7% |
| 4.6 | 75664 | 3.4% |
| 4.5 | 73993 | 3.3% |
| 4.4 | 83043 | 3.7% |
| 4.3 | 79955 | 3.6% |
| 4.2 | 84568 | 3.8% |
| 4.1 | 66769 | 3.0% |
Rating Count
Real number (ℝ≥0)
| Distinct | 35394 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2005.649982 |
| Minimum | 0 |
| Maximum | 56025424 |
| Zeros | 1041090 |
| Zeros (%) | 46.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 6 |
| Q3 | 40 |
| 95-th percentile | 1244 |
| Maximum | 56025424 |
| Range | 56025424 |
| Interquartile range (IQR) | 40 |
Descriptive statistics
| Standard deviation | 88039.39824 |
|---|---|
| Coefficient of variation (CV) | 43.89569418 |
| Kurtosis | 112441.8315 |
| Mean | 2005.649982 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 261.321718 |
| Sum | 4467739770 |
| Variance | 7750935643 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 0 | 1041090 | 46.7% |
| 5 | 62997 | 2.8% |
| 6 | 53130 | 2.4% |
| 7 | 45766 | 2.1% |
| 8 | 39674 | 1.8% |
| 9 | 35103 | 1.6% |
| 10 | 31548 | 1.4% |
| 11 | 28474 | 1.3% |
| 12 | 25483 | 1.1% |
| 13 | 23036 | 1.0% |
| Other values (35384) | 841276 | 37.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1041090 | 46.7% |
| 5 | 62997 | 2.8% |
| 6 | 53130 | 2.4% |
| 7 | 45766 | 2.1% |
| 8 | 39674 | 1.8% |
| 9 | 35103 | 1.6% |
| 10 | 31548 | 1.4% |
| 11 | 28474 | 1.3% |
| 12 | 25483 | 1.1% |
| 13 | 23036 | 1.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 56025424 | 1 | < 0.1% |
| 36446381 | 1 | < 0.1% |
| 31018623 | 1 | < 0.1% |
| 26860860 | 1 | < 0.1% |
| 26340056 | 1 | < 0.1% |
| 22148032 | 1 | < 0.1% |
| 21754025 | 1 | < 0.1% |
| 18956756 | 1 | < 0.1% |
| 18066559 | 1 | < 0.1% |
| 16835698 | 1 | < 0.1% |
Installs
Real number (ℝ≥0)
| Distinct | 20 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 108613.0744 |
| Minimum | 0 |
| Maximum | 1000000000 |
| Zeros | 11173 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 50 |
| median | 500 |
| Q3 | 5000 |
| 95-th percentile | 100000 |
| Maximum | 1000000000 |
| Range | 1000000000 |
| Interquartile range (IQR) | 4950 |
Descriptive statistics
| Standard deviation | 3679195.179 |
|---|---|
| Coefficient of variation (CV) | 33.87433049 |
| Kurtosis | 44326.912 |
| Mean | 108613.0744 |
| Median Absolute Deviation (MAD) | 495 |
| Skewness | 185.3775748 |
| Sum | 2.419439865 × 10¹¹ |
| Variance | 1.353647717 × 10¹³ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=20)
Common Values
| Value | Count | Frequency (%) |
| 100 | 430857 | 19.3% |
| 1000 | 386187 | 17.3% |
| 10 | 289162 | 13.0% |
| 10000 | 246901 | 11.1% |
| 500 | 183387 | 8.2% |
| 50 | 165273 | 7.4% |
| 5000 | 138748 | 6.2% |
| 100000 | 103887 | 4.7% |
| 50000 | 71690 | 3.2% |
| 5 | 70538 | 3.2% |
| Other values (10) | 140947 | 6.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 11173 | 0.5% |
| 1 | 62381 | 2.8% |
| 5 | 70538 | 3.2% |
| 10 | 289162 | 13.0% |
| 50 | 165273 | 7.4% |
| 100 | 430857 | 19.3% |
| 500 | 183387 | 8.2% |
| 1000 | 386187 | 17.3% |
| 5000 | 138748 | 6.2% |
| 10000 | 246901 | 11.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1000000000 | 16 | < 0.1% |
| 500000000 | 33 | < 0.1% |
| 100000000 | 367 | < 0.1% |
| 50000000 | 622 | < 0.1% |
| 10000000 | 5229 | 0.2% |
| 5000000 | 5761 | 0.3% |
| 1000000 | 30396 | 1.4% |
| 500000 | 24969 | 1.1% |
| 100000 | 103887 | 4.7% |
| 50000 | 71690 | 3.2% |
Maximum Installs
Real number (ℝ≥0)
| Distinct | 238730 |
|---|---|
| Distinct (%) | 10.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 192136.9274 |
| Minimum | 0 |
| Maximum | 2123105347 |
| Zeros | 11173 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7 |
| Q1 | 84 |
| median | 683 |
| Q3 | 7061 |
| 95-th percentile | 216818.2 |
| Maximum | 2123105347 |
| Range | 2123105347 |
| Interquartile range (IQR) | 6977 |
Descriptive statistics
| Standard deviation | 5910508.481 |
|---|---|
| Coefficient of variation (CV) | 30.76196002 |
| Kurtosis | 40807.05337 |
| Mean | 192136.9274 |
| Median Absolute Deviation (MAD) | 671 |
| Skewness | 169.9392771 |
| Sum | 4.279998004 × 10¹¹ |
| Variance | 3.493411051 × 10¹³ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 3 | 16028 | 0.7% |
| 2 | 15881 | 0.7% |
| 4 | 15567 | 0.7% |
| 6 | 15134 | 0.7% |
| 5 | 15006 | 0.7% |
| 1 | 14905 | 0.7% |
| 7 | 14258 | 0.6% |
| 8 | 13527 | 0.6% |
| 10 | 13023 | 0.6% |
| 9 | 12613 | 0.6% |
| Other values (238720) | 2081635 | 93.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 11173 | 0.5% |
| 1 | 14905 | 0.7% |
| 2 | 15881 | 0.7% |
| 3 | 16028 | 0.7% |
| 4 | 15567 | 0.7% |
| 5 | 15006 | 0.7% |
| 6 | 15134 | 0.7% |
| 7 | 14258 | 0.6% |
| 8 | 13527 | 0.6% |
| 9 | 12613 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2123105347 | 1 | < 0.1% |
| 1793502218 | 1 | < 0.1% |
| 1704495994 | 1 | < 0.1% |
| 1682763021 | 1 | < 0.1% |
| 1666016612 | 1 | < 0.1% |
| 1645811582 | 1 | < 0.1% |
| 1621265491 | 1 | < 0.1% |
| 1616141394 | 1 | < 0.1% |
| 1494252350 | 1 | < 0.1% |
| 1446535469 | 1 | < 0.1% |
Free
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
| True | 2185451 |
|---|---|
| False | 42126 |
| Value | Count | Frequency (%) |
| True | 2185451 | 98.1% |
| False | 42126 | 1.9% |
Price
Real number (ℝ≥0)
| Distinct | 1033 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1020611807 |
| Minimum | 0 |
| Maximum | 400 |
| Zeros | 2185451 |
| Zeros (%) | 98.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 400 |
| Range | 400 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.654431754 |
|---|---|
| Coefficient of variation (CV) | 26.00824072 |
| Kurtosis | 12503.47129 |
| Mean | 0.1020611807 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 99.08154382 |
| Sum | 227349.1386 |
| Variance | 7.046007939 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 0 | 2185451 | 98.1% |
| 0.99 | 11172 | 0.5% |
| 1.99 | 5412 | 0.2% |
| 2.99 | 3621 | 0.2% |
| 1.49 | 3570 | 0.2% |
| 4.99 | 2327 | 0.1% |
| 3.99 | 2260 | 0.1% |
| 2.49 | 2085 | 0.1% |
| 3.49 | 1205 | 0.1% |
| 9.99 | 809 | < 0.1% |
| Other values (1023) | 9665 | 0.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 2185451 | 98.1% |
| 0.194824 | 2 | < 0.1% |
| 0.204735 | 1 | < 0.1% |
| 0.207889 | 8 | < 0.1% |
| 0.21122 | 1 | < 0.1% |
| 0.263326 | 1 | < 0.1% |
| 0.273542 | 1 | < 0.1% |
| 0.393585 | 1 | < 0.1% |
| 0.415779 | 2 | < 0.1% |
| 0.449011 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 400 | 1 | < 0.1% |
| 399.99 | 23 | < 0.1% |
| 394.99 | 2 | < 0.1% |
| 389.99 | 3 | < 0.1% |
| 384.99 | 1 | < 0.1% |
| 379.99 | 5 | < 0.1% |
| 374.99 | 1 | < 0.1% |
| 369.99 | 1 | < 0.1% |
| 365.99 | 1 | < 0.1% |
| 364.99 | 1 | < 0.1% |
Size
Real number (ℝ≥0)
| Distinct | 1647 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.17557466 |
| Minimum | 0.0032 |
| Maximum | 1500 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0.0032 |
|---|---|
| 5-th percentile | 1.9 |
| Q1 | 4.9 |
| median | 10 |
| Q3 | 25 |
| 95-th percentile | 65 |
| Maximum | 1500 |
| Range | 1499.9968 |
| Interquartile range (IQR) | 20.1 |
Descriptive statistics
| Standard deviation | 23.97999704 |
|---|---|
| Coefficient of variation (CV) | 1.250549069 |
| Kurtosis | 93.7727556 |
| Mean | 19.17557466 |
| Median Absolute Deviation (MAD) | 6.6 |
| Skewness | 4.890497946 |
| Sum | 42715069.07 |
| Variance | 575.0402578 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 11 | 61873 | 2.8% |
| 12 | 55850 | 2.5% |
| 13 | 47819 | 2.1% |
| 14 | 45067 | 2.0% |
| 16 | 42251 | 1.9% |
| 15 | 41113 | 1.8% |
| 10 | 37484 | 1.7% |
| 17 | 37074 | 1.7% |
| 18 | 31534 | 1.4% |
| 19 | 29560 | 1.3% |
| Other values (1637) | 1797952 | 80.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0.0032 | 1 | < 0.1% |
| 0.0033 | 1 | < 0.1% |
| 0.0034 | 1 | < 0.1% |
| 0.0046 | 1 | < 0.1% |
| 0.0047 | 3 | < 0.1% |
| 0.0051 | 1 | < 0.1% |
| 0.0053 | 1 | < 0.1% |
| 0.0058 | 1 | < 0.1% |
| 0.0061 | 2 | < 0.1% |
| 0.0062 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1500 | 2 | < 0.1% |
| 1100 | 8 | < 0.1% |
| 1020 | 1 | < 0.1% |
| 1006 | 1 | < 0.1% |
| 1000 | 3 | < 0.1% |
| 996 | 1 | < 0.1% |
| 985 | 1 | < 0.1% |
| 981 | 1 | < 0.1% |
| 977 | 1 | < 0.1% |
| 963 | 1 | < 0.1% |
Minimum Android
Real number (ℝ≥0)
| Distinct | 23 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.324223809 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 12324 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.3 |
| Q1 | 4.1 |
| median | 4.2 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0.9 |
Descriptive statistics
| Standard deviation | 0.92216513 |
|---|---|
| Coefficient of variation (CV) | 0.2132556433 |
| Kurtosis | 4.876242744 |
| Mean | 4.324223809 |
| Median Absolute Deviation (MAD) | 0.2 |
| Skewness | -0.2864179094 |
| Sum | 9632541.5 |
| Variance | 0.8503885271 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=23)
Common Values
| Value | Count | Frequency (%) |
| 4.1 | 594105 | 26.7% |
| 5 | 389792 | 17.5% |
| 4.4 | 384164 | 17.2% |
| 4 | 330833 | 14.9% |
| 4.2 | 115123 | 5.2% |
| 6 | 88965 | 4.0% |
| 2.3 | 86088 | 3.9% |
| 5.1 | 58698 | 2.6% |
| 4.3 | 40532 | 1.8% |
| 7 | 33930 | 1.5% |
| Other values (13) | 105347 | 4.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 12324 | 0.6% |
| 1 | 308 | < 0.1% |
| 1.1 | 165 | < 0.1% |
| 1.5 | 2095 | 0.1% |
| 1.6 | 8523 | 0.4% |
| 2 | 3207 | 0.1% |
| 2.1 | 16681 | 0.7% |
| 2.2 | 23648 | 1.1% |
| 2.3 | 86088 | 3.9% |
| 3 | 16997 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 8 | 13705 | 0.6% |
| 7.1 | 3036 | 0.1% |
| 7 | 33930 | 1.5% |
| 6 | 88965 | 4.0% |
| 5.1 | 58698 | 2.6% |
| 5 | 389792 | 17.5% |
| 4.4 | 384164 | 17.2% |
| 4.3 | 40532 | 1.8% |
| 4.2 | 115123 | 5.2% |
| 4.1 | 594105 | 26.7% |
Content Rating
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.0 MiB |
| Everyone | 1949928 |
|---|---|
| Teen | 187771 |
| Mature 17+ | 58137 |
| Everyone 10+ | 31460 |
| Unrated | 151 |
Length
| Max length | 15 |
|---|---|
| Median length | 8 |
| Mean length | 7.7718548 |
| Min length | 4 |
Characters and Unicode
| Total characters | 17312405 |
|---|---|
| Distinct characters | 23 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Everyone |
|---|---|
| 2nd row | Everyone |
| 3rd row | Everyone |
| 4th row | Everyone |
| 5th row | Everyone |
Common Values
| Value | Count | Frequency (%) |
| Everyone | 1949928 | 87.5% |
| Teen | 187771 | 8.4% |
| Mature 17+ | 58137 | 2.6% |
| Everyone 10+ | 31460 | 1.4% |
| Unrated | 151 | < 0.1% |
| Adults only 18+ | 130 | < 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
Words
| Value | Count | Frequency (%) |
| everyone | 1981388 | 85.5% |
| teen | 187771 | 8.1% |
| mature | 58137 | 2.5% |
| 17 | 58137 | 2.5% |
| 10 | 31460 | 1.4% |
| unrated | 151 | < 0.1% |
| adults | 130 | < 0.1% |
| only | 130 | < 0.1% |
| 18 | 130 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 4396606 | 25.4% |
| n | 2169440 | 12.5% |
| r | 2039676 | 11.8% |
| y | 1981518 | 11.4% |
| o | 1981518 | 11.4% |
| E | 1981388 | 11.4% |
| v | 1981388 | 11.4% |
| T | 187771 | 1.1% |
| (space) | 89857 | 0.5% |
| 1 | 89727 | 0.5% |
| Other values (13) | 413516 | 2.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 14725790 | 85.1% |
| Uppercase Letter | 2227577 | 12.9% |
| Decimal Number | 179454 | 1.0% |
| Space Separator | 89857 | 0.5% |
| Math Symbol | 89727 | 0.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 4396606 | 29.9% |
| n | 2169440 | 14.7% |
| r | 2039676 | 13.9% |
| y | 1981518 | 13.5% |
| o | 1981518 | 13.5% |
| v | 1981388 | 13.5% |
| t | 58418 | 0.4% |
| a | 58288 | 0.4% |
| u | 58267 | 0.4% |
| d | 281 | < 0.1% |
| Other values (2) | 390 | < 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 1981388 | 88.9% |
| T | 187771 | 8.4% |
| M | 58137 | 2.6% |
| U | 151 | < 0.1% |
| A | 130 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 89727 | 50.0% |
| 7 | 58137 | 32.4% |
| 0 | 31460 | 17.5% |
| 8 | 130 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 89857 | 100.0% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 89727 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 16953367 | 97.9% |
| Common | 359038 | 2.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 4396606 | 25.9% |
| n | 2169440 | 12.8% |
| r | 2039676 | 12.0% |
| y | 1981518 | 11.7% |
| o | 1981518 | 11.7% |
| E | 1981388 | 11.7% |
| v | 1981388 | 11.7% |
| T | 187771 | 1.1% |
| t | 58418 | 0.3% |
| a | 58288 | 0.3% |
| Other values (7) | 117356 | 0.7% |
Common
| Value | Count | Frequency (%) |
| (space) | 89857 | 25.0% |
| 1 | 89727 | 25.0% |
| + | 89727 | 25.0% |
| 7 | 58137 | 16.2% |
| 0 | 31460 | 8.8% |
| 8 | 130 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 17312405 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 4396606 | 25.4% |
| n | 2169440 | 12.5% |
| r | 2039676 | 11.8% |
| y | 1981518 | 11.4% |
| o | 1981518 | 11.4% |
| E | 1981388 | 11.4% |
| v | 1981388 | 11.4% |
| T | 187771 | 1.1% |
| (space) | 89857 | 0.5% |
| 1 | 89727 | 0.5% |
| Other values (13) | 413516 | 2.4% |
Ad Supported
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
| True | 1115394 |
|---|---|
| False | 1112183 |
| Value | Count | Frequency (%) |
| True | 1115394 | 50.1% |
| False | 1112183 | 49.9% |
In App Purchases
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
| False | 2046587 |
|---|---|
| True | 180990 |
| Value | Count | Frequency (%) |
| False | 2046587 | 91.9% |
| True | 180990 | 8.1% |
Editors Choice
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.1 MiB |
| False | 2226878 |
|---|---|
| True | 699 |
| Value | Count | Frequency (%) |
| False | 2226878 | > 99.9% |
| True | 699 | < 0.1% |
Auto
The auto setting is an easily interpretable pairwise column metric that chooses a method by variable-type pair: Cramér's V for categorical-categorical pairs, Cramér's V on a discretized numerical column for numerical-categorical pairs, and Spearman's ρ for numerical-numerical pairs. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 no monotonic correlation, and +1 total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
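As an illustration of that definition, here is a minimal pure-Python sketch: rank both variables (tied values share the average rank, which is an assumption about tie handling) and apply the covariance-over-standard-deviations formula to the ranks. The toy data and function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but non-linear relationship
rho = spearman(x, y)      # ranks agree perfectly
r = pearson(x, y)         # linear measure is lower for the same data
print(rho, r)
```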
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation, and +1 total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
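The invariance under location and scale changes can be checked numerically. This is a small sketch with hypothetical toy data, not the report's implementation:

```python
import math


def pearson(x, y):
    """Pearson's r: covariance over the product of standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    # The 1/n factors of covariance and variances cancel, so raw sums suffice.
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly linear in x

r = pearson(x, y)
# Shift and rescale each variable separately with positive factors: r is unchanged.
r_shifted = pearson([3 * a + 7 for a in x], [0.5 * b - 2 for b in y])
# A negative scale factor on one variable flips the sign only.
r_flipped = pearson(x, [-b for b in y])
print(r, r_shifted, r_flipped)
```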
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation, and +1 total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
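The pair-counting definition translates directly into code. The sketch below implements that plain form (often called tau-a), in which tied pairs count as neither concordant nor discordant; libraries commonly default to the tie-adjusted tau-b, so results can differ when ties are present. The quadratic loop is for illustration only and would be far too slow for the 2.2 million rows of this dataset.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1    # both variables order the pair the same way
        elif sign < 0:
            discordant += 1    # the variables order the pair in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]   # hypothetical toy data
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)   # 7 concordant and 3 discordant pairs out of 10
print(tau)
```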
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | df_index | Category | Rating | Rating Count | Installs | Maximum Installs | Free | Price | Size | Minimum Android | Content Rating | Ad Supported | In App Purchases | Editors Choice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Adventure | 0.0 | 0.0 | 10 | 15 | True | 0.0 | 10.0 | 7.1 | Everyone | False | False | False |
| 1 | 1 | Tools | 4.4 | 64.0 | 5000 | 7662 | True | 0.0 | 2.9 | 5.0 | Everyone | True | False | False |
| 2 | 2 | Productivity | 0.0 | 0.0 | 50 | 58 | True | 0.0 | 3.7 | 4.0 | Everyone | False | False | False |
| 3 | 3 | Communication | 5.0 | 5.0 | 10 | 19 | True | 0.0 | 1.8 | 4.0 | Everyone | True | False | False |
| 4 | 4 | Tools | 0.0 | 0.0 | 100 | 478 | True | 0.0 | 6.2 | 4.1 | Everyone | False | False | False |
| 5 | 5 | Social | 0.0 | 0.0 | 50 | 89 | True | 0.0 | 46.0 | 6.0 | Teen | False | True | False |
| 6 | 6 | Libraries & Demo | 4.5 | 12.0 | 1000 | 2567 | True | 0.0 | 2.5 | 4.1 | Everyone | True | False | False |
| 7 | 7 | Lifestyle | 2.0 | 39.0 | 500 | 702 | True | 0.0 | 16.0 | 5.0 | Everyone | False | False | False |
| 8 | 8 | Communication | 0.0 | 0.0 | 10 | 18 | True | 0.0 | 1.3 | 4.4 | Teen | False | False | False |
| 9 | 9 | Personalization | 4.7 | 820.0 | 50000 | 62433 | True | 0.0 | 3.5 | 4.1 | Everyone | True | False | False |
Last rows
| | df_index | Category | Rating | Rating Count | Installs | Maximum Installs | Free | Price | Size | Minimum Android | Content Rating | Ad Supported | In App Purchases | Editors Choice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2227567 | 2312934 | Education | 0.0 | 0.0 | 5 | 6 | True | 0.0 | 3.6 | 4.0 | Everyone | True | False | False |
| 2227568 | 2312935 | Personalization | 0.0 | 0.0 | 1000 | 1302 | True | 0.0 | 29.0 | 4.1 | Everyone | True | False | False |
| 2227569 | 2312936 | Business | 0.0 | 0.0 | 100 | 353 | True | 0.0 | 21.0 | 5.0 | Everyone | False | False | False |
| 2227570 | 2312937 | Education | 0.0 | 0.0 | 5 | 7 | True | 0.0 | 6.6 | 4.4 | Everyone | False | False | False |
| 2227571 | 2312938 | Education | 3.4 | 17.0 | 1000 | 1980 | True | 0.0 | 10.0 | 4.1 | Everyone | True | False | False |
| 2227572 | 2312939 | Role Playing | 4.3 | 16775.0 | 100000 | 337109 | True | 0.0 | 77.0 | 4.1 | Teen | False | False | False |
| 2227573 | 2312940 | Education | 0.0 | 0.0 | 100 | 430 | True | 0.0 | 44.0 | 4.1 | Everyone | False | False | False |
| 2227574 | 2312941 | Education | 0.0 | 0.0 | 100 | 202 | True | 0.0 | 29.0 | 5.0 | Everyone | False | False | False |
| 2227575 | 2312942 | Music & Audio | 3.5 | 8.0 | 1000 | 2635 | True | 0.0 | 10.0 | 5.0 | Everyone | True | False | False |
| 2227576 | 2312943 | Trivia | 5.0 | 12.0 | 100 | 354 | True | 0.0 | 5.2 | 5.0 | Everyone | True | False | False |